If you get an error, go to the Packages panel in RStudio, click Install, and type the name of the library (you only need to do this once).
library(fpp2)
library(tidyverse)
library(TSstudio)
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
\[X_t = A(\omega) \sin\left(\frac{2 \pi t}{500}\right) \]
where \(A(\omega)\) is uniformly distributed in \([0, 3]\). We create \(T\) samples (time index \(t = 1, 2, \ldots, T\))
T <- 500
and we generate k different realizations of the stochastic process.
k <- 5
We set the random seed to ensure reproducibility
set.seed(2024)
Now we generate a sample of k values of \(A(\omega)\), one for each realization.
A <- runif(k, min = 0, max = 3)
A
## [1] 2.5108276 0.9626026 2.0410900 2.0945194 1.3710276
Note: the value of \(\omega\) itself is invisible! What we get is \[A(\omega_1), A(\omega_2), \ldots, A(\omega_k)\] and we use these values to create the realizations. First we create a matrix in which each column will be one realization.
X = matrix(nrow = T, ncol = k)
The time index
idx <- 1:T
is used in a for loop to fill up the matrix
for(i in 1:k){
X[, i] = A[i] * sin(2 * pi * idx/500)
}
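The loop above can also be written without an explicit loop. A minimal sketch using `outer()` (the name `X2` is just illustrative, to keep it separate from `X`):

```r
# outer(u, v) returns the matrix with entries u[i] * v[j],
# so one call builds the whole T x k matrix of realizations.
T <- 500
k <- 5
set.seed(2024)
A <- runif(k, min = 0, max = 3)
idx <- 1:T
X2 <- outer(sin(2 * pi * idx / 500), A)  # element (t, i) = A[i] * sin(2*pi*t/500)
dim(X2)
```

This produces exactly the same matrix as the for loop, column by column.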
The filled matrix has these dimensions:
dim(X)
## [1] 500 5
and the first rows look like this
head(X)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.03155116 0.01209610 0.02564842 0.02631981 0.01722839
## [2,] 0.06309734 0.02419030 0.05129279 0.05263547 0.03445406
## [3,] 0.09463355 0.03628067 0.07692905 0.07894282 0.05167428
## [4,] 0.12615482 0.04836531 0.10255317 0.10523770 0.06888635
## [5,] 0.15765617 0.06044232 0.12816110 0.13151596 0.08608754
## [6,] 0.18913262 0.07250978 0.15374879 0.15777345 0.10327513
Let us convert the matrix to a (multi) time series and plot it:
X <- ts(X)
ts_plot(X)
A fixed row of the matrix is a sample of the random variable \(X_t(\omega)\). For example, with \(t = 300\) we are looking at the 300th row. What is the distribution of this random variable? (Change k to 100 to create many more realizations and uncomment the code below.)
# boxplot(X[300, ], horizontal = TRUE)
# stripchart(X[300, ], add = TRUE, method = "jitter",
# jitter = 0.1, col="red", pch = "*")
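Since \(X_{300} = A\,\sin(2\pi \cdot 300/500)\) with \(A\) uniform on \([0, 3]\), the row is itself a scaled uniform sample. A minimal sketch checking this with many realizations (the sample size 1000 and the names `A2`, `x300` are just illustrative):

```r
set.seed(2024)
sin300 <- sin(2 * pi * 300 / 500)   # a negative constant, approx -0.588
A2 <- runif(1000, min = 0, max = 3)
x300 <- A2 * sin300                 # sample of X_300: uniform on [3 * sin300, 0]
range(x300)
```

Every value falls in the interval \([3\sin(2\pi \cdot 300/500),\, 0]\), as the uniform scaling predicts.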
We will create a Gaussian white noise time series. To do that we draw a sample of \(n\) random values from a standard normal \(Z \sim N(0, 1)\).
n <- 150
z <- rnorm(n, mean = 0, sd = 1)
# 0 and 1 are the default values for rnorm so this is equivalent to rnorm(n)
head(z, 30)
## [1] 0.52848984 -0.51519285 -1.17975801 1.67576553 0.03957866 1.00367234
## [7] 1.09487912 -0.53759950 -1.21599973 1.03642690 -0.54665708 0.89866055
## [13] -2.03101135 -0.55253852 1.93296014 0.47037026 1.82659804 0.37822290
## [19] 1.34357616 -0.27426002 -0.13026142 -0.58512271 0.38662495 0.64670968
## [25] -0.02634722 0.57058484 -0.87083217 -0.58818511 -0.18505933 0.35688267
Let us now use this to define a ts time series object. Note that this time we are not providing the frequency, start, etc. In this case, the ts function will create a time index using the natural numbers \(t = 1, 2, 3, \ldots\).
w <- ts(z)
head(w, 25)
## Time Series:
## Start = 1
## End = 25
## Frequency = 1
## [1] 0.52848984 -0.51519285 -1.17975801 1.67576553 0.03957866 1.00367234
## [7] 1.09487912 -0.53759950 -1.21599973 1.03642690 -0.54665708 0.89866055
## [13] -2.03101135 -0.55253852 1.93296014 0.47037026 1.82659804 0.37822290
## [19] 1.34357616 -0.27426002 -0.13026142 -0.58512271 0.38662495 0.64670968
## [25] -0.02634722
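For comparison, if we did supply start and frequency, ts would build a calendar-style index instead. A sketch (the monthly start c(2020, 1) and the name `w2` are arbitrary choices for illustration):

```r
set.seed(1)
z2 <- rnorm(24)
# frequency = 12 means 12 observations per period (monthly data),
# start = c(2020, 1) means the series begins in January 2020.
w2 <- ts(z2, start = c(2020, 1), frequency = 12)
frequency(w2)
start(w2)
```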
And if we make a time plot we can see that the result is really noisy:
autoplot(w) +
ggtitle("White noise") +
geom_point(aes(x = 1:n, y = z), size=1.5, col="blue")
White noise series are our basic example of an unpredictable process, but they are nonetheless very important.
A random walk is a stochastic process usually defined by the recursive equation: \[y_t = k + y_{t-1} + w_t\] where \(w_t\) is white noise (the random walk is Gaussian if \(w_t\) is Gaussian). The value \(k\) is the drift constant, and a random walk with \(k \neq 0\) is called a random walk with drift. In the expression above, computing \(y_1\) requires knowledge of \(y_0\), so we set \(y_0 = 0\) (equivalently, \(y_1 = k + w_1\)).
With this assumption, an equivalent definition of the random walk is the sum of a linear trend term and the cumulative sum of the white noise process: \[y_t = k\cdot t + \displaystyle\sum_{i = 1}^t w_i \]
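We can check this equivalence numerically: build one walk with the recursion and another with the cumulative-sum formula and compare. A sketch (the names `y_rec`, `y_cum` and the small values of n and k are just for this check):

```r
set.seed(2024)
n_chk <- 50
k_chk <- 0.1
w_chk <- rnorm(n_chk)

# recursive definition, with y_0 = 0
y_rec <- numeric(n_chk)
y_rec[1] <- k_chk + w_chk[1]
for (t in 2:n_chk) y_rec[t] <- k_chk + y_rec[t - 1] + w_chk[t]

# closed form: linear trend plus cumulative sum of the noise
y_cum <- k_chk * (1:n_chk) + cumsum(w_chk)

all.equal(y_rec, y_cum)   # TRUE
```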
Let us simulate n values of a random walk with drift:
n = 1000
set.seed(2024)
Let k be the drift constant
k = 0.1
First we create the white noise time series values:
w = 10 * rnorm(n)
and then we build the series as the linear trend plus the cumulative sum of the noise:
rw_ts = ts(k * (1:n) + cumsum(w))
Now if we look at the time plot, it does not look like noise any more!
autoplot(rw_ts) +
ggtitle("Random walk with drift")
Note: compare this to the Google’s daily closing stock price series in 2015 (Figure 5.8 in Section 5.2 of Hyndman, FPP3)
The reason behind this is that the recursive relation \(y_t = k + y_{t-1} + w_t\) creates a correlation structure between the values of the time series. The random variables \(y_t\) and \(y_{t-1}\) are no longer uncorrelated.
If we plot the original series and one of the first lags this correlation is apparent (to enhance the visualization we only plot the first 100 values):
rw_ts_lag <- stats::lag(rw_ts, k=-5)
Note the Start value of the lagged series.
head(rw_ts_lag, 20)
## Time Series:
## Start = 6
## End = 25
## Frequency = 1
## [1] 9.9196941 14.7068445 13.7271312 11.6983495 23.3793341 36.4028825
## [7] 41.8493540 40.6790068 28.5302526 17.4167396 0.7974894 5.5866687
## [13] 14.0085677 17.0387838 -15.6040744 -16.9851820 -26.1845693 -21.5424258
## [19] -37.5034982 -57.5182115
autoplot(head(rw_ts, 100), color="blue", alpha = 0.5) +
autolayer(head(rw_ts_lag, 100), color="red", alpha = 0.5)
Recall that the autocorrelation function of \(y_t\) is \[ \rho(s, t) = \dfrac{\gamma(s, t)}{\sqrt{\gamma(s, s)\,\gamma(t, t)}}, \] where \(\gamma(s, t) = \operatorname{cov}(x_s, x_t)\) is the autocovariance function. We can visualize the correlations between \(y_t\) and the first lags \(y_{t - k}\) for \(k = 1, 2, 3,\ldots\) using lag plots:
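For a fixed lag, the sample version of \(\rho\) can be computed by hand and compared with acf(). A sketch at lag 1 (the series `x` and the name `rho1_hand` are just for this illustration):

```r
set.seed(2024)
x <- rnorm(200)
m <- mean(x)
nx <- length(x)

# Sample lag-1 autocovariance divided by the sample variance,
# following the same convention acf() uses (divide both sums by n).
rho1_hand <- sum((x[-1] - m) * (x[-nx] - m)) / sum((x - m)^2)
rho1_acf  <- acf(x, lag.max = 1, plot = FALSE)$acf[2]

all.equal(rho1_hand, rho1_acf)   # TRUE
```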
gglagplot(rw_ts, do.lines = FALSE, continuous = FALSE)
Let us plot the ACF for the random walk with drift:
ggAcf(rw_ts)
In the ACF plot, the vertical bar at lag \(k\) represents the correlation of \(y_t\) with the \(k\)-lagged time series \(y_{t - k}\). To get the numerical values we can use:
acf(rw_ts, lag.max = 10, plot = FALSE)
##
## Autocorrelations of series 'rw_ts', by lag
##
## 0 1 2 3 4 5 6 7 8 9 10
## 1.000 0.996 0.991 0.987 0.983 0.979 0.976 0.972 0.968 0.965 0.961
The dotted horizontal blue lines in the ACF plot come from a hypothesis test for the significance of the autocorrelation values: if an autocorrelation bar extends beyond the blue lines, it is considered significantly different from zero. In this case the pattern of very slow decay in the ACF is a clear indicator of the non-stationary character of this process. In fact, even the mean depends on \(t\): \[\mu_{rw}(t) = k\cdot t\] and even if the drift is 0, the autocovariance \(\gamma(s, t)\) still depends on \((s, t)\).
The situation for the random walk is in sharp contrast with the ACF of the white noise:
ggAcf(w)
acf(w, lag.max = 10, plot = FALSE)
##
## Autocorrelations of series 'w', by lag
##
## 0 1 2 3 4 5 6 7 8 9 10
## 1.000 -0.021 -0.038 0.024 0.016 -0.030 -0.015 0.002 -0.023 0.019 -0.032
And for the autocovariance of a white noise process \(w_t\) we get \[\gamma_{w}(s, t) = \begin{cases} \sigma_{w}^2& s = t\\[3mm] 0 & s \neq t \end{cases}, \] where \(\sigma_{w}\) is the standard deviation of \(w_t\). The autocorrelation is therefore 1 for \(s = t\) and 0 otherwise. For a more mathematically inclined (and R based) introduction to these topics we recommend Chapters 1 and 2 of Shumway and Stoffer (2017).
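A quick simulation check of this autocovariance, using many independent realizations of a white noise process (the number of realizations, 10000, and \(\sigma_w = 2\) are arbitrary choices for this sketch):

```r
set.seed(2024)
R <- 10000         # number of independent realizations
sigma_w <- 2
# Each row of W is one realization w_1, ..., w_5 of the white noise process.
W <- matrix(rnorm(R * 5, sd = sigma_w), nrow = R, ncol = 5)

cov_same <- cov(W[, 3], W[, 3])   # estimates gamma(3, 3): close to sigma_w^2 = 4
cov_diff <- cov(W[, 2], W[, 5])   # estimates gamma(2, 5): close to 0
c(cov_same, cov_diff)
```

The first value is essentially the variance \(\sigma_w^2\); the second hovers near zero, matching the piecewise formula above.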